
refactor: optimize argocd-application-controller redis usage #5345

Merged

alexmt merged 3 commits into argoproj:master from the two-level-caching branch on Jan 29, 2021

Conversation

@alexmt (Collaborator) commented Jan 28, 2021

Signed-off-by: Alexander Matyushentsev AMatyushentsev@gmail.com

The argocd-application-controller uses Redis heavily, which results in significant traffic that can be costly in the cloud. This PR introduces two-level caching: before reading from or writing to Redis, the controller checks an in-memory cache first and skips Redis if the in-memory cache already has the required entry. This is safe because the controller is the only writer to Redis.
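Roughly, the read and write paths look like the sketch below. This is only a minimal illustration of the two-level idea under the assumptions stated in the comments; the type, field, and method names (twoLevelCache, externalCache, inMemory) are made up for the example and the real code wires into Argo CD's existing cache interfaces.

package cache

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sync"
)

// externalCache stands in for the Redis-backed client.
type externalCache interface {
	Get(key string) ([]byte, error)
	Set(key string, value []byte) error
}

// twoLevelCache keeps a process-local copy of every entry it reads or writes,
// so repeated reads and identical writes never hit Redis.
type twoLevelCache struct {
	mu       sync.Mutex
	inMemory map[string][]byte
	external externalCache
}

func newTwoLevelCache(external externalCache) *twoLevelCache {
	return &twoLevelCache{inMemory: map[string][]byte{}, external: external}
}

// Set skips the Redis round trip when the in-memory copy already holds the
// exact value being written. This is safe only because the controller is the
// sole writer to Redis.
func (c *twoLevelCache) Set(key string, obj interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(obj); err != nil {
		return fmt.Errorf("error encoding %q: %w", key, err)
	}
	c.mu.Lock()
	// Note: gob encoding of maps is not canonical, so a spurious mismatch only
	// costs an extra Redis write, never a wrong result.
	same := bytes.Equal(c.inMemory[key], buf.Bytes())
	if !same {
		c.inMemory[key] = buf.Bytes()
	}
	c.mu.Unlock()
	if same {
		return nil // Redis already has this exact value
	}
	return c.external.Set(key, buf.Bytes())
}

// Get serves from memory when possible and falls back to Redis on a miss,
// remembering the result locally for subsequent reads.
func (c *twoLevelCache) Get(key string, obj interface{}) error {
	c.mu.Lock()
	data, ok := c.inMemory[key]
	c.mu.Unlock()
	if !ok {
		var err error
		if data, err = c.external.Get(key); err != nil {
			return err
		}
		c.mu.Lock()
		c.inMemory[key] = data
		c.mu.Unlock()
	}
	return gob.NewDecoder(bytes.NewReader(data)).Decode(obj)
}

With multiple writers the in-memory layer could serve stale data; the single-writer property of the controller is what makes skipping Redis safe here.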

refactor: controller uses two level caching to reduce number of redis calls

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
@alexmt (Collaborator, Author) commented Jan 28, 2021

Testing results on a busy Argo CD instance:

The number of Redis calls was reduced by ~30%:

[screenshot: Redis calls, 2021-01-28 9:57 AM]

Sent/received traffic was reduced by almost 4x:

[screenshot: sent/received traffic, 2021-01-28 10:01 AM]

Controller memory usage before and after the upgrade is the same:

[screenshot: controller memory usage]

@alexmt alexmt marked this pull request as ready for review January 28, 2021 19:14
@@ -303,7 +303,8 @@ export class PodView extends React.Component<PodViewProps> {
renderMenu: () => this.props.nodeMenu(rnode)
};
}

});
(tree.nodes || []).forEach((rnode: ResourceTreeNode) => {
Member:

Is this a bug fix?

@alexmt (Collaborator, Author) replied Jan 28, 2021

Oh, yes. It appears that the UI might break if a pod appears before its parent resource in the ApplicationTree.Nodes list. After introducing node sorting in the backend, this reproduces consistently.

@jannfis (Member) left a comment

I have a few questions, please see below.

And can you please provide comments for the public methods introduced in this change?

Comment on lines 34 to 37
err := gob.NewEncoder(&buf).Encode(obj)
if err != nil {
return false
}
Member:

Could it be that we'll silently drop some real errors here? In which cases does gob.NewEncoder() fail?

@alexmt (Collaborator, Author) replied Jan 29, 2021

Agreed, changed the method so it returns an error instead of silently dropping it.
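For illustration, the fix amounts to something like the following; the function name hasSameEncoding and its signature are assumptions for this sketch, not necessarily the names used in the PR.

package cache

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// hasSameEncoding compares an object's gob encoding against an existing cached
// buffer. Instead of collapsing an encode failure into a bare "false", it
// propagates the error so the caller can decide how to handle it.
func hasSameEncoding(existing []byte, obj interface{}) (bool, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(obj); err != nil {
		return false, fmt.Errorf("error encoding object: %w", err)
	}
	return bytes.Equal(existing, buf.Bytes()), nil
}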

if !found {
return false
}
existingBuf := bufIf.(bytes.Buffer)
Member:

Maybe we should safeguard this type cast, but I'm not sure. Can there be something other than a bytes.Buffer in the in-memory cache for whatever reason?

alexmt (Collaborator, Author):

I think we want to panic here, because if the in-memory cache has garbage we should restart the pod. Added an explicit check and a panic with a meaningful message.
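A rough sketch of what the safeguarded assertion looks like; the helper name and message here are illustrative only, not the actual code in the PR.

package cache

import (
	"bytes"
	"fmt"
)

// mustBuffer asserts that a value taken from the in-memory cache really is a
// bytes.Buffer and panics with a clear message otherwise, so a corrupted
// in-memory cache restarts the pod instead of silently misbehaving.
func mustBuffer(bufIf interface{}) bytes.Buffer {
	existingBuf, ok := bufIf.(bytes.Buffer)
	if !ok {
		panic(fmt.Sprintf("in-memory cache is expected to store only bytes.Buffer values, got %T", bufIf))
	}
	return existingBuf
}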


type twoLevelClient struct {
inMemoryCache *InMemoryCache
client CacheClient
Member:

I think it would be great to have some consistency in the member names, i.e. inMemoryCache and externalCache. But this is just nitpicking.

alexmt (Collaborator, Author):

👍 Renamed

@alexmt alexmt requested a review from jannfis January 29, 2021 18:56
Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
@alexmt alexmt force-pushed the two-level-caching branch from 327ce7d to d12fe5a on January 29, 2021 19:21
@jannfis (Member) left a comment

Thanks for the awesome commentary!

LGTM.

@alexmt alexmt added the cherry-pick/1.8 Candidate for cherry picking into the 1.8 release branch label Jan 29, 2021
@alexmt (Collaborator, Author) commented Jan 29, 2021

Cherry-picking to 1.8 since it fixes the unexpected traffic increase caused by #4965.

@alexmt alexmt merged commit 2167082 into argoproj:master Jan 29, 2021
@alexmt alexmt deleted the two-level-caching branch January 29, 2021 19:43
alexmt pushed a commit that referenced this pull request Feb 5, 2021
* refactor: controller uses two level caching to reduce number of redis calls

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
shubhamagarwal19 pushed a commit to shubhamagarwal19/argo-cd that referenced this pull request Apr 15, 2021
…j#5345)

* refactor: controller uses two level caching to reduce number of redis calls

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>